Shift-curvature, SGD, and generalization
Abstract
A longstanding debate surrounds the related hypotheses that low-curvature minima generalize better, and that stochastic gradient descent (SGD) discourages curvature. We offer a more complete and nuanced view in support of both hypotheses. First, we show that curvature harms test performance through two new mechanisms, the shift-curvature and the bias-curvature, in addition to a known parameter-covariance mechanism. The shift refers to the difference between train and test local minima, and the bias and covariance are those of the parameter distribution. These three curvature-mediated contributions to test performance are reparametrization-invariant even though curvature itself is not. Although the shift is unknown at training time, the shift-curvature as well as the other mechanisms can still be mitigated by minimizing overall curvature. Second, we derive a new, explicit SGD steady-state distribution showing that SGD optimizes an effective potential related to but different from the train loss, and that SGD noise mediates a trade-off between low-loss versus low-curvature regions of this effective potential. Third, combining our test performance analysis with the SGD steady state shows that, for small SGD noise, the shift-curvature is the dominant of the three mechanisms. Our experiments demonstrate the significant impact of shift-curvature on test loss, and further explore the relationship between SGD noise and curvature.
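One way to picture the shift-curvature mechanism is a second-order toy model. The sketch below is an illustration only, not the paper's derivation: it assumes the test loss is exactly quadratic around its own minimum, that the test minimum sits at the train minimum plus a shift, and that the Hessian values are hypothetical. Under those assumptions, evaluating the test loss at the train minimum gives an excess of one half of the shift contracted twice with the Hessian, so the same shift costs more in a sharper basin.

```python
import numpy as np

def quadratic_loss(theta, minimum, hessian):
    """Loss 0.5 * (theta - minimum)^T H (theta - minimum), zero at `minimum`."""
    d = theta - minimum
    return 0.5 * d @ hessian @ d

# Hypothetical 2-D setup: one shift, two curvatures (all values are assumptions).
train_min = np.array([0.0, 0.0])
shift = np.array([0.1, -0.2])        # assumed train-to-test shift of the minimum
test_min = train_min + shift

flat_hessian = np.diag([1.0, 1.0])     # low-curvature basin
sharp_hessian = np.diag([10.0, 10.0])  # high-curvature basin

# Excess test loss incurred by sitting at the train minimum instead of the test minimum.
flat_excess = quadratic_loss(train_min, test_min, flat_hessian)
sharp_excess = quadratic_loss(train_min, test_min, sharp_hessian)

print(f"excess test loss, flat basin:  {flat_excess:.3f}")
print(f"excess test loss, sharp basin: {sharp_excess:.3f}")
```

In this toy model the shift never has to be known to reduce the penalty: scaling the Hessian down reduces the excess for any shift, which matches the abstract's remark that minimizing overall curvature can mitigate the effect.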
Similar resources
Theory of Deep Learning III: Generalization Properties of SGD
In Theory III we characterize with a mix of theory and experiments the consistency and generalization properties of deep convolutional networks trained with Stochastic Gradient Descent in classification tasks. A present perceived puzzle is that deep networks show good predictive performance when overparametrization relative to the number of training data suggests overfitting. We describe an exp...
Improving Generalization Performance by Switching from Adam to SGD
Despite superior training outcomes, adaptive optimization methods such as Adam, Adagrad or RMSprop have been found to generalize poorly compared to Stochastic gradient descent (SGD). These methods tend to perform well in the initial portion of training but are outperformed by SGD at later stages of training. We investigate a hybrid strategy that begins training with an adaptive method and switc...
Spatial Peak Shift and Generalization in Pigeons
How pigeons generalize across spatial locations was examined in the 4 experiments reported in this article. During training, a square was presented at a fixed height at 1 of 2 horizontal locations on a monitor screen. One location (S+) signaled reward, whereas the other one (S−) signaled no reward. The birds were then tested occasionally with a range of locations. After training with S+ only, ...
Temporal generalization and peak shift in humans.
Three experiments investigated temporal generalization in humans. In Experiment 1, a peak shift effect was produced when participants were given intradimensional discrimination training. In Experiment 2, after training with a standard S+ and generalization testing with an asymmetrical series of durations, generalization gradients moved toward the prevailing adaptation level. In Experiment 3, ge...
Spatial generalization and peak shift in humans
Using a computer betting game, five experiments tested university students on spatial generalization and peak shift. On each trial, one location was marked and the subject was invited to bet 0–4 points. At the winning location (S+), bets won four times the points betted. At nearby losing locations (S−s), points betted were lost. Generalization gradients were exponential in shape, supporting She...
Journal
Journal title: Machine Learning: Science and Technology
Year: 2022
ISSN: 2632-2153
DOI: https://doi.org/10.1088/2632-2153/ac92c4